

Section: New Results

Auto-tuning at Run-time with Multiple Implementations of OpenMP Tasks

Participants: Luis Felipe Garlet Millani, Lucas Mello Schnorr [UFRGS, Brazil], Jean-François Mehaut.

OpenMP has established itself as the de facto standard for parallel programming in shared-memory environments. Over the years it has received many additions, including ones that enable its use with heterogeneous systems. We propose an extension to the OpenMP task pragma that allows a task to provide multiple ways to compute the desired result. The run-time is thus given several implementations, each with different trade-offs.
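As an illustration only, the idea of one task carrying several interchangeable implementations can be emulated in plain C with a table of function pointers. This sketch does not show the syntax of the proposed pragma extension, which is not given here; the names `axpy_fn`, `task_variant`, `axpy_variants`, and the two kernels are hypothetical.

```c
#include <stddef.h>

/* Hypothetical signature shared by every implementation of one task:
 * computes y[i] += a * x[i] for i in [0, n). */
typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Variant 1: straightforward loop. */
static void axpy_simple(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Variant 2: manually unrolled by 4. Same result, different trade-off
 * (fewer loop branches, larger code size). */
static void axpy_unrolled(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    /* Remainder elements when n is not a multiple of 4. */
    for (; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical descriptor: one entry per implementation of the task. */
struct task_variant {
    const char *name;
    axpy_fn run;
};

/* The set of implementations a run-time could choose from. */
static const struct task_variant axpy_variants[] = {
    { "simple",   axpy_simple },
    { "unrolled", axpy_unrolled },
};

enum { AXPY_NUM_VARIANTS = sizeof axpy_variants / sizeof axpy_variants[0] };
```

Any entry can be invoked as `axpy_variants[k].run(n, a, x, y)`. Since all variants compute the same result, the choice of `k` affects only performance.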

With the BOAST auto-tuning framework (Section 5.7), these implementations can be generated automatically before execution. Within this framework, however, the auto-tuned kernel is selected in an environment that differs from an actual execution of the application. In particular, tasks may not interact at all during auto-tuning, whereas in the actual execution they affect each other through shared resources such as cache or memory bandwidth. A kernel selected in isolation during auto-tuning is therefore probably not the best choice once it runs embedded in the full application.

We propose to address this limitation by having the auto-tuning phase select not a single implementation but a set of implementations, from which the final choice is made during execution. Our approach also permits tuning for different parameters (such as memory accesses and number of operations), and lets the run-time use whichever implementation is most adequate for a thread based on monitored load.
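A minimal sketch of the run-time half of this idea, assuming the auto-tuning phase has already kept a small set of candidates. The `candidate` structure, the `mem_pressure` metric, and the linear cost model are illustrative assumptions, not the selection policy of the actual run-time.

```c
#include <stddef.h>

/* One pre-selected implementation with hypothetical per-call estimates
 * gathered during the auto-tuning phase. */
struct candidate {
    const char *name;
    double mem_accesses;  /* estimated memory accesses per call */
    double operations;    /* estimated arithmetic operations per call */
};

/*
 * Return the index of the candidate with the lowest estimated cost under
 * the current load. Requires count >= 1.
 *
 * mem_pressure is an assumed monitored metric in [0, 1]: 0 means memory
 * bandwidth is free, 1 means it is saturated by other tasks.
 * Assumed cost model: each memory access costs 1 unit when the bus is
 * free and 10 units when it is saturated; each operation costs 1 unit.
 */
size_t select_candidate(const struct candidate *set, size_t count,
                        double mem_pressure)
{
    size_t best = 0;
    double best_cost = 0.0;

    for (size_t i = 0; i < count; i++) {
        double mem_weight = 1.0 + 9.0 * mem_pressure;
        double cost = set[i].operations + mem_weight * set[i].mem_accesses;
        if (i == 0 || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

For example, given a compute-heavy candidate (100 memory accesses, 1000 operations) and a memory-heavy one (400 memory accesses, 200 operations), this model picks the memory-heavy candidate when `mem_pressure` is 0 and the compute-heavy one when `mem_pressure` is 1. The point of keeping a set rather than a single kernel is that neither candidate is best under every load.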

Our extension is implemented within the LLVM framework and the Clang compiler front-end. Furthermore, we extend the LLVM OpenMP Run-time to be aware of the multiple task implementations. We verify the efficacy of our proposal with the Ondes3D seismic wave simulator and a sparse matrix multiplication application.